ABSTRACT


Tuberculosis (TB) is an infectious disease caused by Mycobacterium
tuberculosis. A common method for detecting TB is to count the
Mycobacterium tuberculosis bacteria in sputum manually. Unfortunately,
manual counting by pathologists takes a relatively long time. Previous
research on TB bacteria was limited to detecting the presence or absence of
Mycobacterium tuberculosis in sputum images. This research aims to identify
the number of Mycobacterium tuberculosis bacteria and to determine the
accuracy of TB severity classification by using a nonparametric Poisson
regression model with a local linear estimator. The steps are image
processing, dimension reduction by discrete wavelet transformation and
partial least squares, and identification of the number of bacteria with
the proposed model. We obtain deviance values of 28.410 for the
nonparametric and 93.029 for the parametric approach, and average
classification accuracies over 4 iterations of 92.75% for the nonparametric
and 85.5% for the parametric approach. Thus, for identifying the number of
Mycobacterium tuberculosis bacteria in sputum images and classifying TB
severity, the proposed method gives higher accuracy and a shorter
identification time than the parametric linear regression method.
1. INTRODUCTION


Mycobacterium tuberculosis causes the directly transmitted infectious disease tuberculosis (TB).
Identification of TB through microscopic observation (screening) of sputum smear samples has greatly
helped to prevent TB [1-7]. However, identifying TB through microscopic screening requires
a long time, high accuracy, and expert laboratory personnel. Because the identification process is long
and demands high accuracy, statistical modeling and software assistance are needed to identify TB
from patients' sputum samples by means of image processing. Digital image processing is the discipline
that studies techniques for processing images digitally [8-10].

Several researchers have studied the identification of TB from sputum images.
The authors of [11-13] used meta-analysis, [14] used a self-organizing map, and [15] used
the learning vector quantization (LVQ) method, which gave an accuracy of 91.33%. The study in [16] used
neural network methods that gave an accuracy of 77.5%, [17] used an intelligent system, [18] used an automatic
scanning microscope, [19] used a spatial domain filter, [20] used an artificial neural network (ANN)
and a Bayesian method, [21, 22] used the support vector machine (SVM) method, [23] used deep learning, and [24] used
the Gaussian fuzzy neural network (GFNN) method, which gave an accuracy of 91.38%. However, these
studies only detected the presence or absence of Mycobacterium tuberculosis in patients' sputum images.
They did not identify how many TB bacteria are in the sputum images, and they did not
classify tuberculosis severity.

Previously, TB bacteria have been counted manually, which takes a long time. In this research,
we propose a statistical model approach that can shorten the time needed to count TB bacteria. In this
approach, we model the number of TB bacteria from TB patients with a regression model. There are
parametric and nonparametric regression models. Estimation of the regression functions in these
models has been studied in [25, 26]. Nonparametric regression functions are only assumed to be
smooth, i.e., continuous and differentiable, so they are very flexible in determining the form of the regression
function [25]. There are several estimators for a nonparametric regression function. The local linear estimator is
one of them: it is a smoothing technique in nonparametric regression and a special case of the local
polynomial smoothing technique [27, 28].

In statistical modelling we can use the locally weighted maximum likelihood method to estimate
the regression function at the observation points [28]. However, the maximum likelihood equations
for the parameters of a Poisson regression cannot be solved in closed form. They require a Newton-Raphson
iteration procedure. The Newton-Raphson method is an iterative method used to solve equations that cannot
be solved directly because they are not linear in the parameters [28]. In addition, several statistical models have
been used for modelling disease data [29-35]. Therefore, in this research, we propose a statistical model
approach called the Poisson additive nonparametric regression model with a local linear estimator to identify how
many TB bacteria are in sputum images of TB patients. The dimension of a sputum image is usually very
large. Hence, we use the discrete wavelet transformation (DWT) and partial least squares (PLS) methods to
reduce the dimension of the images.


2. RESEARCH METHOD 

We use secondary data consisting of 100 TB sputum images: 75 in-sample images and 25
out-of-sample images. The research steps are image processing, dimension reduction, identification of
the number of bacteria using nonparametric and parametric regression approaches, and classification of
tuberculosis severity.


2.1. Processing of image 

This step improves the image quality so that more information about the TB bacteria
in the sputum can be extracted from the image and the later stages become easier.
The stages are reading the TB sputum image file, gray-scaling,
thresholding, histogram equalization, and resizing. Figure 1 illustrates
these stages. Next, the resized images are arranged in a matrix whose columns
represent predictors and whose rows represent observations.





Figure 1. The stages of image processing (a) Image of TB sputum, (b) Grayscale, (c) Threshold,
(d) Histogram equalization, (e) Resized image
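
The image-processing stages listed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the threshold level (128), the choice to suppress rather than binarize pixels below the threshold, the nearest-neighbour resizing, and the target shape of 32 x 64 (chosen only because it yields 2048 = 2^11 predictors) are all hypothetical, and the function names are ours.

```python
import numpy as np

def to_grayscale(rgb):
    """Luminance-weighted grayscale of an (H, W, 3) uint8 image."""
    weights = np.array([0.299, 0.587, 0.114])
    return np.round(rgb[..., :3].astype(float) @ weights).astype(np.uint8)

def threshold(gray, level=128):
    """Set pixels below `level` to 0 and keep the rest (assumed variant)."""
    return np.where(gray >= level, gray, 0).astype(np.uint8)

def equalize_histogram(gray):
    """Histogram equalization through the cumulative distribution."""
    hist = np.bincount(gray.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]              # cdf value at the darkest used level
    denom = max(cdf[-1] - cdf_min, 1)      # guard against constant images
    lut = np.clip(np.round((cdf - cdf_min) / denom * 255), 0, 255)
    return lut.astype(np.uint8)[gray]

def resize_nearest(img, shape):
    """Nearest-neighbour resize to (rows, cols)."""
    rows = np.arange(shape[0]) * img.shape[0] // shape[0]
    cols = np.arange(shape[1]) * img.shape[1] // shape[1]
    return img[np.ix_(rows, cols)]

def image_to_row(rgb, shape=(32, 64)):
    """Run all stages and flatten the result into one observation row."""
    gray = to_grayscale(rgb)
    resized = resize_nearest(equalize_histogram(threshold(gray)), shape)
    return resized.ravel().astype(float)
```

Stacking `image_to_row(img)` for every image gives the observation-by-predictor matrix described in the text.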


2.2. Dimension reduction using DWT and PLS 
The tuberculosis sputum images are obtained from ZNSM-IDB (Ziehl-Neelsen sputum smear microscopy
image database), which can be accessed at 14.139.240.55/znsm. Each tuberculosis sputum image has
a very large size, which makes computation difficult. Hence, we need to reduce
the dimension. In this research, we use the DWT method to reduce the dimension. This method
gives results that are close to the original variables and can handle high-dimensional data. However,
multicollinearity between the variables in the model may remain, because mathematically we cannot
guarantee that the correlation coefficients between these variables are relatively small. Therefore, we use the PLS
method to overcome this multicollinearity, because it produces new, mutually uncorrelated
variables. The following steps are needed to reduce the dimension:


a. Defining matrix $X_{(n \times q)}$ for a sample size $n$ and $q = 2^M$ for a positive integer $M$;

b. Determining an orthogonal wavelet matrix $W_{(q \times q)}$;

c. Determining wavelet coefficient matrix $D$ that is transformation result of $W$ in step b;

d. Determining $m$ ($< q$) that follows $D^{*}_{(n \times m)} = X_{(n \times q)} W^{T}_{(q \times m)}$ by substituting zero values into the
$(m+1)$-th column to the $q$-th column;

e. Determining matrix $D^{*}_{(n \times m)}$ that is a correlation matrix to check collinearity;

f. Determining best number of components based on the percentage of variance and the value of root mean 
square error of prediction (RMSEP); 

g. Determining the optimal number of latent vectors based on the RMSEP plot; 

h. Calculating the optimal component X-score selected to determine the number of predictor variables after 
being reduced. 

Based on steps from (a) to (h), we reduce 2048 predictor variables to 5 predictor variables. 
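
A minimal sketch of steps (a) to (h) is given below. It makes several assumptions that the text does not state: a Haar wavelet is used for the orthogonal matrix W, the first m rows of W (the coarsest coefficients) are the ones retained, and the PLS scores come from a plain NIPALS PLS1 loop. The function names are hypothetical, and the choice of the number of components by RMSEP is not shown.

```python
import numpy as np

def haar_matrix(q):
    """Orthonormal Haar wavelet matrix W (q x q); q must be a power of two."""
    if q < 1 or (q & (q - 1)) != 0:
        raise ValueError("q must be a power of two")
    W = np.array([[1.0]])
    while W.shape[0] < q:
        n = W.shape[0]
        averages = np.kron(W, [1.0, 1.0])           # coarse (scaling) rows
        details = np.kron(np.eye(n), [1.0, -1.0])   # finest detail rows
        W = np.vstack([averages, details]) / np.sqrt(2.0)
    return W

def truncated_dwt(X, m):
    """D* = X W^T restricted to the first m wavelet coefficients."""
    W = haar_matrix(X.shape[1])
    return X @ W[:m].T

def pls1_scores(D, y, n_components):
    """X-scores of a PLS1 fit (NIPALS); the scores are mutually orthogonal."""
    E = D - D.mean(axis=0)          # centred predictors
    f = y - y.mean()                # centred response
    scores = []
    for _ in range(n_components):
        w = E.T @ f
        w = w / np.linalg.norm(w)   # weight vector
        t = E @ w                   # score vector
        p = E.T @ t / (t @ t)       # loading vector
        E = E - np.outer(t, p)      # deflate predictors
        f = f - t * (f @ t) / (t @ t)
        scores.append(t)
    return np.column_stack(scores)
```

With the sizes reported in the text, `X` would have 2048 columns and `pls1_scores(truncated_dwt(X, m), y, 5)` would return the 5 reduced predictors; the value of `m` is not given in the source.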


2.3. Estimate the number of bacteria using nonparametric and parametric regressions 
We conduct the following steps to identify the data: 
- Testing whether the number of Mycobacterium tuberculosis data (Y) follows a Poisson distribution;
- Determining optimal bandwidths for every predictor using the cross validation (CV) criterion;
- Estimating the regression function based on the obtained optimal bandwidths by using the locally weighted maximum
likelihood and Newton-Raphson iteration methods;
- Providing plots of the observed data ($Y$) and the estimation results ($\hat{\mu}$);
- Testing the suitability of the estimated model by using the deviance test statistic;
- Analyzing and interpreting the estimated model that has been obtained.
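
The estimation step can be illustrated with the following sketch of a local linear Poisson fit by locally weighted maximum likelihood and Newton-Raphson iteration. It is a sketch under assumptions: the Gaussian product kernel, the starting values, and the stopping rule are our choices, since the text does not specify them, and `local_linear_poisson` is a hypothetical name.

```python
import numpy as np

def local_linear_poisson(x0, X, y, h, n_iter=100, tol=1e-8):
    """Estimate mu(x0) from a local model log mu = b0 + sum_j b_j (X_j - x0_j).

    X: (n, p) predictors, y: (n,) counts, x0: (p,) target point,
    h: (p,) bandwidths, one per predictor.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Z = np.column_stack([np.ones(len(y)), X - x0])    # local design matrix
    u = (X - x0) / h
    w = np.exp(-0.5 * np.sum(u ** 2, axis=1))         # assumed Gaussian kernel
    beta = np.zeros(Z.shape[1])
    beta[0] = np.log(max(np.average(y, weights=w), 1e-8))  # start at local mean
    for _ in range(n_iter):
        mu = np.exp(np.clip(Z @ beta, -30.0, 30.0))
        score = Z.T @ (w * (y - mu))                  # gradient of local log-lik
        info = Z.T @ (Z * (w * mu)[:, None])          # negative Hessian
        step = np.linalg.solve(info, score)           # Newton-Raphson update
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return float(np.exp(beta[0])), beta
```

Because the local design is centred at `x0`, the fitted mean at the target point is simply `exp(beta[0])`, which is why only the intercept is needed for the estimate at each observation.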


3. RESULTS AND DISCUSSION 

Based on 100 observations, we have 2048 predictor variables that are reduced to 5 predictor variables
through image processing and the DWT and PLS methods. These reduced predictors are used to model and
estimate the number of Mycobacterium tuberculosis in the sputum image. First, we estimate the number of
Mycobacterium tuberculosis by using the parametric Poisson regression model approach. To do so,
we estimate the model parameters, perform simultaneous and individual significance tests, estimate the number
of bacteria, and calculate the accuracy for each observation. The estimated values are given in Table 1.


Table 1. Values of estimation for parametric Poisson regression

Predictor    Coefficient    SE-coefficient    Z         P
Intercept    2.607          0.032             80.679    0.000
X1           0.044          0.033              1.350    0.177
X2           2.526          0.402              6.272    0.000
X3           2.294          0.533              4.299    0.000
X4           1.633          0.541              3.017    0.002
X5           0.963          0.453              2.125    0.033


In this research, for each observation we obtain the estimated value $\hat{\mu}_i$ as follows:

$\hat{\mu}_i = \exp(2.607 + 0.044 X_1 + 2.526 X_2 + 2.294 X_3 + 1.633 X_4 + 0.963 X_5)$    (1)
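
As a small worked example, the fitted parametric model can be evaluated directly from the Table 1 coefficients. The sketch assumes the usual log link of Poisson regression; the function name and the input vector are hypothetical.

```python
import numpy as np

# Intercept followed by the coefficients of X1..X5, taken from Table 1.
COEF = np.array([2.607, 0.044, 2.526, 2.294, 1.633, 0.963])

def parametric_mean(x):
    """Estimated bacteria count for rows of reduced predictors (X1..X5)."""
    x = np.atleast_2d(np.asarray(x, dtype=float))
    return np.exp(COEF[0] + x @ COEF[1:])

# With all five predictors at zero, the estimate is exp(2.607).
baseline = parametric_mean(np.zeros(5))[0]
```

Since the coefficients of X2 to X5 are positive, larger values of those reduced predictors multiply the estimated count.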


The second step is identifying the number of Mycobacterium tuberculosis by using the local linear
estimator of the nonparametric Poisson regression model. To obtain the estimated regression function, it is
necessary to determine the optimal bandwidth (h) using the CV criterion. Plots of CV versus bandwidth values
for every predictor variable are given in Figures 2-6, and the optimal bandwidths and minimum CV values for
every predictor variable are given in Table 2.

Table 2. The optimal bandwidths and minimum CV values for every predictor variable

Predictor    Optimal bandwidth (h)    Minimum CV value
X1           0.546                    47.98752
X2           0.36                     21.52445
X3           0.043                    25.46812
X4           0.081                    27.92228
X5           0.099                    27.48833

Figure 6. Plot of cross validation values of bandwidths for predictor 5 (X5)
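
Bandwidth selection by cross validation can be sketched for a single predictor as follows. The exact CV criterion is not given in the text, so this sketch assumes leave-one-out mean squared prediction error, a Gaussian kernel, and a user-supplied grid of candidate bandwidths; all function names are hypothetical.

```python
import numpy as np

def fit_at(x0, x, y, h, n_iter=100, tol=1e-8):
    """Local linear Poisson estimate of the mean at x0 (one predictor)."""
    Z = np.column_stack([np.ones_like(x), x - x0])
    w = np.exp(-0.5 * ((x - x0) / h) ** 2)            # assumed Gaussian kernel
    beta = np.array([np.log(max(np.average(y, weights=w), 1e-8)), 0.0])
    for _ in range(n_iter):
        mu = np.exp(np.clip(Z @ beta, -30.0, 30.0))
        step = np.linalg.solve(Z.T @ (Z * (w * mu)[:, None]),
                               Z.T @ (w * (y - mu)))  # Newton-Raphson step
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return float(np.exp(beta[0]))

def cv_score(x, y, h):
    """Leave-one-out mean squared prediction error for bandwidth h."""
    n = len(y)
    errors = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i                      # drop observation i
        errors[i] = (y[i] - fit_at(x[i], x[keep], y[keep], h)) ** 2
    return float(np.mean(errors))

def select_bandwidth(x, y, grid):
    """Return the bandwidth in `grid` with the smallest CV score."""
    scores = [cv_score(x, y, h) for h in grid]
    return grid[int(np.argmin(scores))], scores
```

Plotting `scores` against `grid` gives a curve of the kind shown in the CV figures, and its minimum corresponds to an entry of Table 2. Very small bandwidths can leave too few effective neighbours for a stable fit, so the grid should be chosen with the spacing of the data in mind.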


Then, we use the values given in Table 2 to get the estimated model $\hat{\mu}$ for every observation.
For example, for the 26th observation we estimate the model as follows:

$\hat{\mu}_{26} = \exp\{2.486 - 0.264(X_1 - 0.347) + 1.960(X_2 - 0.140) + 3.331(X_3 + 0.143) + 3.224(X_4 + 0.033) + 1.345(X_5 - 0.154)\}$

where $X_1 \in (x_1 - 0.546, x_1 + 0.546)$; $X_2 \in (x_2 - 0.36, x_2 + 0.36)$; $X_3 \in (x_3 - 0.043, x_3 + 0.043)$;
$X_4 \in (x_4 - 0.081, x_4 + 0.081)$; $X_5 \in (x_5 - 0.099, x_5 + 0.099)$.
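
As a short worked check of the local model for the 26th observation, suppose the centring constants in the formula (0.347, 0.140, -0.143, -0.033, 0.154) are the predictor values of that observation; this is our reading and is not stated explicitly in the text. Every centred term then vanishes at the observation itself, and the estimate reduces to the exponential of the local intercept:

```latex
\hat{\mu}_{26}\big|_{X_j = x_j}
  = \exp\{2.486 + 0 + 0 + 0 + 0 + 0\}
  = e^{2.486}
  \approx 12.01
```

Under that reading, the local model estimates about 12 bacteria in the 26th image, and the slope terms only describe how the estimate changes for predictor values inside the bandwidth windows.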


The estimates of the number of Mycobacterium tuberculosis obtained with the parametric Poisson linear
regression and the nonparametric Poisson regression based on the local linear estimator for the 75 in-sample
observations are plotted together in Figure 7. Next, we compare the estimation results of the parametric Poisson linear
regression and the nonparametric Poisson regression with the local linear estimator based on a goodness
of fit criterion, namely the minimum deviance value. These results are shown in Table 3.

Figure 7. Plots of estimation results using parametric Poisson linear regression and nonparametric Poisson
regression approaches


Table 3. Deviance values of parametric and nonparametric regression approaches

Model                                            Deviance
Parametric Poisson linear regression             93.029
Nonparametric Poisson local linear regression    28.410


Table 3 shows that the deviance of the nonparametric Poisson local linear regression model
is smaller than that of the parametric Poisson linear regression model. This indicates that, in this case,
the nonparametric Poisson local linear regression model is more suitable for modelling and analyzing the data
than the parametric Poisson linear regression model. This is also supported by the deviance tests
of the two models for the in-sample data. For the parametric Poisson linear regression model,
the deviance value of 93.029 is greater than the chi-square critical value $\chi^2_{(0.05;\,70)}$ of 90.5312
at the 5% significance level. This means that, statistically,
the parametric Poisson linear regression model is not suitable for modelling and analyzing
the data. In contrast, for the nonparametric Poisson regression model with the
local linear estimator, the deviance value of 28.410 is less than the chi-square critical value $\chi^2_{(0.05;\,70)}$
of 90.5312 at the 5% significance level. Therefore, the nonparametric Poisson local linear regression model
is more appropriate for modelling and analyzing the data than the parametric Poisson linear regression
model.
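
The deviance comparison can be reproduced in outline with the sketch below. The deviance formula is the standard one for Poisson models; the critical value 90.5312 is taken from the text rather than recomputed, and the function names are hypothetical.

```python
import numpy as np

def poisson_deviance(y, mu):
    """Poisson deviance 2 * sum(y * log(y / mu) - (y - mu)), with 0*log(0) = 0."""
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    safe_y = np.where(y > 0, y, 1.0)                  # avoid log(0)
    log_term = np.where(y > 0, y * np.log(safe_y / mu), 0.0)
    return float(2.0 * np.sum(log_term - (y - mu)))

CRITICAL_VALUE = 90.5312   # chi-square critical value quoted in the text

def model_is_adequate(deviance, critical=CRITICAL_VALUE):
    """A model is judged suitable when its deviance is below the critical value."""
    return deviance < critical

# Deviance values reported in Table 3.
parametric_ok = model_is_adequate(93.029)       # False: model rejected
nonparametric_ok = model_is_adequate(28.410)    # True: model not rejected
```

Passing the observed counts and the fitted means of either model to `poisson_deviance` would give the corresponding entry of Table 3.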

Furthermore, after identifying the number of Mycobacterium tuberculosis, we can also classify
tuberculosis severity based on the IUATLD scale. The accuracy of the tuberculosis severity
classification is 95%. This result indicates that the obtained model is valid for classifying the severity
of tuberculosis. The classification accuracies of the parametric Poisson linear regression and nonparametric
Poisson local linear regression models for 4 iterations are shown in Table 4. Table 4 shows that, over the
four iterations, the average classification accuracy of the nonparametric Poisson regression model
with the local linear estimator is 92.75%, and the average classification accuracy of the
parametric Poisson linear regression model is 85.5%. This means that, on average, tuberculosis
severity is classified more accurately by the nonparametric Poisson regression model based
on the local linear estimator than by the parametric Poisson linear regression model.


Table 4. Classification accuracy values of parametric Poisson linear regression and nonparametric Poisson
local linear regression models for four iterations

Iteration    Classification accuracy
             Nonparametric regression model    Parametric regression model
1            95%                               85%
2            91%                               86%
3            92%                               87%
4            93%                               84%
Average      92.75%                            85.5%
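
The averages in Table 4 can be checked with a few lines of code. The accuracy helper is a generic sketch: the mapping from estimated counts to IUATLD grades is not given in the text, so the grade labels passed to it are left to the user, and the function name is hypothetical.

```python
import numpy as np

def classification_accuracy(true_grades, predicted_grades):
    """Percentage of images whose predicted severity grade matches the truth."""
    true_grades = np.asarray(true_grades)
    predicted_grades = np.asarray(predicted_grades)
    return float(np.mean(true_grades == predicted_grades) * 100.0)

# Per-iteration accuracies (in percent) copied from Table 4.
TABLE_4 = {
    "nonparametric": [95, 91, 92, 93],
    "parametric": [85, 86, 87, 84],
}

averages = {model: float(np.mean(values)) for model, values in TABLE_4.items()}
```

The resulting averages, 92.75 for the nonparametric model and 85.5 for the parametric model, agree with the last row of Table 4.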


4. CONCLUSION 

Based on the average classification accuracy and the deviance value, the nonparametric Poisson
regression model based on the local linear estimator identifies the number
of Mycobacterium tuberculosis better than the parametric Poisson linear regression model.
Thus, the proposed method gives higher accuracy and a shorter identification time
than the parametric Poisson linear regression method.